GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Table 1
9 tasks
All tasks are binary classification, except STS-B (regression) and MNLI (three classes).
Single-Sentence Tasks
CoLA
Corpus of Linguistic Acceptability
SST-2
Stanford Sentiment Treebank
Similarity and Paraphrase Tasks
MRPC
Microsoft Research Paraphrase Corpus
STS-B
Semantic Textual Similarity Benchmark
QQP
Quora Question Pairs
Inference Tasks
MNLI
Multi-Genre NLI corpus
NLI stands for natural language inference.
QNLI
RTE
Recognizing Textual Entailment
WNLI
Winograd Schema Challenge recast as NLI
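The task list and the note on output types can be captured in a small lookup, which may help when deciding what kind of output head each task needs. This is only a sketch of the notes above: the names GLUE_TASKS, output_type, and all_tasks are hypothetical and do not come from the paper or any library.

```python
# Task groups and task names as listed in Table 1 of the notes.
# GLUE_TASKS, all_tasks, and output_type are illustrative names only.
GLUE_TASKS = {
    "Single-Sentence Tasks": ["CoLA", "SST-2"],
    "Similarity and Paraphrase Tasks": ["MRPC", "STS-B", "QQP"],
    "Inference Tasks": ["MNLI", "QNLI", "RTE", "WNLI"],
}


def all_tasks() -> list[str]:
    """Flatten the grouped table into a single list of task names."""
    return [task for group in GLUE_TASKS.values() for task in group]


def output_type(task: str) -> str:
    """Return the output type of a task, following the rule in the notes:
    everything is binary classification except STS-B and MNLI."""
    if task not in all_tasks():
        raise ValueError(f"unknown task: {task}")
    if task == "STS-B":
        return "regression"
    if task == "MNLI":
        return "three-class classification"
    return "binary classification"


if __name__ == "__main__":
    for name in all_tasks():
        print(f"{name}: {output_type(name)}")
```

Keeping the two exceptions as explicit branches mirrors how the notes state the rule, so the default case stays binary classification.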
We evaluate baselines that use ELMo [...] as well as state-of-the-art sentence representation models.